British Journal of Ophthalmology

BMJ

Preprints posted in the last 30 days, ranked by how well they match British Journal of Ophthalmology's content profile, based on 13 papers previously published here. The average preprint has a 0.09% match score for this journal, so anything above that is already an above-average fit.

1
Interpretable machine-learning model for cataract associated factors identifying in patients with high myopia

Su, K.; Duan, Q.; He, W.; Wild, B.; Eils, R.; Lehmann, I.; Gu, L.; Zhu, X.

2026-02-27 ophthalmology 10.64898/2026.02.25.26347145
#1
155× avg

Purpose: To systematically evaluate ocular biometric and systemic laboratory factors associated with cataract in highly myopic eyes and to characterize potential nonlinear associations using an interpretable machine learning approach, thereby providing deeper mechanistic insights into the pathogenesis of highly myopic cataract. Design: A cross-sectional study of 770 eyes of 594 patients with high myopia from Eye & ENT Hospital of Fudan University. Subjects: The non-cataract control group included 458 eyes and the cataract group 312 eyes. Methods: Demographic traits and ocular biometric and systemic laboratory factors were collected; features with more than 30% missing data were excluded. Composite indices were calculated. Multiple machine learning models were compared to investigate the association between features and highly myopic cataract, and the random forest (RF) model was chosen and fine-tuned. Feature selection was carried out using Shapley additive explanations (SHAP); non-linear relationships were probed using SHAP dependence plots and confirmed with partial dependence plots. Main Outcome Measures: (1) The area under the curve (AUC) and other metrics of multiple machine learning models; (2) top feature importance of the final simplified RF model; (3) overall trends between features and highly myopic cataract; (4) potential inflection points of top continuous features. Results: A simplified fine-tuned RF model with 17 features reached stable discriminative performance, with a mean AUC of 0.762 (95% CI: [0.731, 0.794]) across 10 independent testing sets.
Age and axial length (AL) were the most influential features and had non-linear relationships with highly myopic cataract, with inflection points around 65.75 (95% CI: [63.72, 67.79]) years for age and 30.55 (95% CI: [29.22, 31.88]) mm for AL, while the ratio of anterior chamber depth to axial length (ACD/AL) showed a U-shaped association with highly myopic cataract. Ocular biometric factors were more strongly related to highly myopic cataract than systemic laboratory factors. Conclusions: Ocular biometric factors, especially age, AL, and composite indices such as ACD/AL, have strong and non-linear associations with highly myopic cataract. These results emphasize the significance of ocular structural arrangement in cataract within highly myopic eyes and indicate that interpretable data-driven methods could offer clinically relevant insight into its phenotypic description.

2
Real-world utilization and initial experience with aflibercept-ayyh (PAVBLU(R)) for retinal disorders in United States retina practices: A descriptive retrospective analysis

Servin, A. E.; McFadden, I.; Esmaeilkhanian, H.; Holcomb, D.; Lin, J.; Awh, C. C.

2026-02-27 ophthalmology 10.64898/2026.02.25.26345681
Top 0.2%
143× avg

Introduction: Anti-vascular endothelial growth factor (anti-VEGF) therapies are standards of care for vision-threatening retinal diseases. This retrospective observational study describes demographics, utilization, best recorded visual acuity (BRVA), and safety among eyes with neovascular age-related macular degeneration (nAMD), diabetic retinopathy (DR), diabetic macular edema (DME), or retinal vein occlusion (RVO) treated with the biosimilar aflibercept-ayyh (PAVBLU(R)) in routine clinical practice. Methods: Electronic medical records from the Retina Consultants of America database of patients receiving aflibercept-ayyh (12/1/2024-10/31/2025) were analyzed, focusing on eyes with ≥84 days of follow-up. The index date was the first documented aflibercept-ayyh injection. Postindex data were used to assess treatment patterns, BRVA (Wilcoxon signed rank test), and adverse events of special interest (AESIs). Results: A total of 1,000 consecutive eyes from 989 patients received 3,730 injections of aflibercept-ayyh; most (91%) switched from prior anti-VEGF therapy and 9% were anti-VEGF treatment-naive. Disease distribution was 58% nAMD, 19% RVO, 16% DME, and 7% DR. Among switchers, median (IQR) number of prior injections was 21 (8-46). Median (IQR) follow-up was 6.0 months (4.6-7.1). Median (IQR) number of aflibercept-ayyh injections per eye was 4 (3-5). Among eyes with ≥84 days of follow-up (n=889), mean BRVA expressed as logarithm of minimum angle of resolution (logMAR) remained stable for switchers (0.4 to 0.4; P=0.96) and improved from baseline in anti-VEGF-naive eyes (0.5 to 0.4; P<0.01). Confirmed AESIs included iritis (n=2; 0.05% of injections), with no events of vitreous cells, endophthalmitis, retinal detachment, retinal vasculitis, or vitreous hemorrhage.
Conclusion: In this descriptive real-world analysis, aflibercept-ayyh was associated with stable visual acuity in previously treated eyes and vision improvement in treatment-naive eyes, with no new or unexpected safety findings, consistent with expectations for aflibercept. These findings add real-world experience to preexisting evidence demonstrating no clinically meaningful differences between aflibercept-ayyh (PAVBLU(R)) and reference aflibercept (EYLEA(R)).
Key Summary Points
Why carry out this study?
The anti-vascular endothelial growth factor (VEGF) drug aflibercept, approved in 2011 and marketed in the United States as EYLEA(R),* has demonstrated efficacy in treating retinal diseases such as neovascular age-related macular degeneration (nAMD), diabetic retinopathy (DR), diabetic macular edema (DME), or retinal vein occlusion (RVO) and is a standard of care for these disorders.
Aflibercept-ayyh is a biosimilar to aflibercept that has demonstrated comparable efficacy and safety in the treatment of nAMD in a randomized controlled clinical trial.
This study describes the real-world use patterns, vision outcomes, and safety of aflibercept-ayyh in clinical settings in the United States for the treatment of nAMD, DR, DME, and RVO.
What was learned from the study?
In this real-world study of 1,000 consecutive eyes treated with the biosimilar aflibercept-ayyh in patients with retinal diseases, we observed no new safety concerns and that aflibercept-ayyh maintained visual acuity in eyes switching anti-VEGF agents and improved vision in anti-VEGF-naive eyes, consistent with expected responses to aflibercept.
These findings support aflibercept-ayyh as a suitable treatment option when anti-VEGF therapy is indicated.
*EYLEA(R) is a registered trademark of Regeneron Pharmaceuticals, Inc. PAVBLU(R) is a registered trademark of Amgen Inc.

3
Predicting visual function before glaucoma onset from baseline optical coherence tomography scans using deep learning

Chaurasia, A. K.; Wang, C.; Toohey, P. W.; Chen, C. Y.; MacGregor, S.; Bennett, M. T.; Verma, N.; Craig, J. E.; McCartney, P. J.; Sarossy, M. G.; Hewitt, A. W.

2026-03-02 ophthalmology 10.64898/2026.02.27.26347297
Top 0.3%
128× avg

Background: The visual field (VF) test results of many eyes with glaucoma progress despite treatment. This suggests that some eyes are either untreated or that the management of intraocular pressure (IOP) does not influence the outcome. In this work, we explore whether future VF parameters can be predicted from a baseline optical coherence tomography retinal nerve fibre layer (OCT-RNFL) scan using a deep learning model. Methods: The model was developed using 1792 eyes from 1610 patients, and externally validated on 151 eyes from a second centre using the same Zeiss Cirrus machine and 281 eyes from a third centre using scans obtained from a different (Heidelberg Spectralis) machine. The Vision Transformer (ViT)-based regression model was trained on baseline OCT-RNFL scans to predict three key VF indices (follow-up interval: 4.74 ± 2.59 years). Model performance was evaluated using Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), with 95% confidence intervals (CI). Results: The model achieved an overall MAE of 2.07 (95% CI: 1.91-2.22) and RMSE of 2.87 (95% CI: 2.60-3.14) on the internal validation set. On external validation, the model showed comparable performance with an MAE of 2.07 (95% CI: 1.8-2.35) for the external validation (Zeiss OCT) cohort and 2.11 (95% CI: 1.93-2.31) for the external validation (Heidelberg OCT) cohort. Saliency maps revealed that the inner and outer RNFL layers were key structures in driving the model's predictions. Conclusions: Our ViT-based regression model effectively predicts key VF indices objectively from a single OCT-RNFL scan, with strong performance across two OCT devices, offering a novel tool for predicting glaucoma progression.
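For readers unfamiliar with the two error metrics reported in abstract 3, the sketch below shows how MAE and RMSE are computed for a regression output. The function names and the sample values are hypothetical and chosen only for illustration; the abstract does not give per-eye predictions or name the three VF indices.

```python
from math import sqrt


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference; penalizes large misses more."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical follow-up values of one VF index and the model's predictions.
observed = [-2.0, -5.5, -1.0, -8.0]
predicted = [-3.0, -4.5, -1.5, -5.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

RMSE is never smaller than MAE on the same data, which is consistent with the abstract reporting an RMSE (2.87) above its MAE (2.07).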

4
A large deletion spanning multiple enhancers near PITX2 increases primary open-angle glaucoma risk

Said, K.; Segre, A.; Wiggs, J. L.; Aboobakar, I. F.

2026-03-02 ophthalmology 10.64898/2026.02.26.25342774
Top 0.3%
121× avg

Importance: Genome-wide association studies have identified hundreds of common single nucleotide polymorphisms (SNPs) and small insertions/deletions (indels) associated with primary open-angle glaucoma (POAG) risk, though these variants have modest effect sizes and individually may have minor contributions to disease development. As whole-genome sequencing data is becoming more readily available, structural variants and other complex genomic features can be interrogated for contribution to disease risk. Objective: Test the association of structural variants in known glaucoma loci with disease risk. Design: Cross-sectional study. Setting: A multicenter cohort of individuals from the United States who contributed genomic and electronic health record data to the All of Us Research Program. Participants: POAG case/control cohorts were generated in the All of Us Researcher Workbench using age (>40 for cases, >65 for controls) and ICD 9/10 diagnosis codes. Main Outcomes and Measures: Logistic regression analyses adjusted for age, sex, and the top 10 principal components of ancestry were used to test association of structural variants within 500 kilobases of 309 known open-angle glaucoma risk loci. The significance threshold after Bonferroni correction was set at p < 1.6 × 10⁻⁴. Results: 516 POAG cases and 18,716 controls of European ancestry from the All of Us v8 data release were included in the analysis. Mean age was 77.0 years among cases and 74.7 years among controls. Females comprised 45.7% of cases and 56.5% of controls. An 8,732 base pair deletion upstream of PITX2 (chr4:110680827-110689558) was associated with 7.3-fold higher odds of POAG (95% confidence interval: 2.9-18.5, p = 2.4 × 10⁻⁵, variant carrier frequency = 1.6% in cases and 0.25% in controls). Functional annotation identified multiple enhancers overlapping the deletion, suggesting that this structural variant likely impacts gene regulation and expression.
Conclusion and Relevance: Whole genome sequencing data captures rare structural variants with large effect sizes that are missed by conventional SNP and indel genotyping approaches, enabling improved POAG risk stratification. These data also expand the phenotypic spectrum of structural variation in the PITX2 locus from childhood glaucoma to adult-onset disease, where age at diagnosis and clinical severity may be influenced by the extent of disrupted regulatory elements.
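The arithmetic behind two figures in abstract 4 can be sketched as follows. This assumes a family-wise alpha of 0.05 divided across 309 tests (one per locus), which reproduces the stated threshold but is not spelled out in the abstract. The crude odds ratio is computed from the rounded carrier frequencies only, so it is unadjusted and is not expected to equal the reported covariate-adjusted estimate of 7.3. Function names are illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def crude_odds_ratio(p_cases: float, p_controls: float) -> float:
    """Unadjusted odds ratio from carrier frequencies in cases and controls."""
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Assumed: alpha = 0.05 and one test per known risk locus (309 loci).
threshold = bonferroni_threshold(0.05, 309)

# Carrier frequencies as rounded in the abstract: 1.6% of cases, 0.25% of controls.
crude_or = crude_odds_ratio(0.016, 0.0025)

print(f"Bonferroni threshold: {threshold:.2e}")
print(f"Crude (unadjusted) odds ratio: {crude_or:.1f}")
```

The crude value falls inside the reported 95% confidence interval (2.9-18.5); the gap from 7.3 is plausibly due to covariate adjustment and rounding of the published frequencies.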

5
Axial Length Matters: Scaling Effects in Retinal Fundus Image Analysis

Li, Q.; Harish, A. B.; Guo, H.; Leung, J. T.; Radhakrishnan, H.

2026-03-04 ophthalmology 10.64898/2026.03.03.26347501
Top 0.3%
119× avg

Purpose: Quantitative metrics obtained from retinal fundus images (such as vessel length, tortuosity and other scale-dependent measures) are increasingly used as potential biomarkers for systemic diseases, including cardio- and neurovascular conditions. However, with the increasing prevalence of myopia and related axial growth, this study aims to evaluate whether axial length scaling significantly alters the overall distributions of the inferred biomarkers compared with biomarker data obtained without axial length scaling, and whether these effects can be corrected. Methods: Data from 2,309 clinic visits by patients aged ≤21 years were extracted and analysed for axial-length scaling (range 20 to 28 mm). The retinal fundus photographs were automatically segmented using Automorph to extract biometric data, including vascular metrics. The parameters were further corrected for axial length using correction factors based on the Bennett-Littmann formula and true axial length. Results: Axial length significantly influenced biometric parameters (vessel metrics) derived from fundus photography. The magnitude of error in diameter and length of blood vessels was approximately 4-5% for each 1 mm deviation from the reference axial length of 24 mm, whereas the error in vessel area was approximately 9-10% per 1 mm, consistent with the geometric expectation that area scales with the square of linear dimensions. The scaling corrections for different axial lengths are presented. Conclusions: Axial-length-related magnification introduces systematic bias into retinal vascular metrics from fundus photographs. Bennett-Littmann correction using true axial length reduces these errors and should be adopted in quantitative fundus imaging and AI biomarker development.
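A minimal sketch of the axial-length scaling described in abstract 5, assuming the commonly cited Bennett form of the ocular magnification factor, q = 0.01306 × (AL - 1.82). The abstract names the Bennett-Littmann formula and the 24 mm reference but does not give constants or an implementation, so the constants, function names, and the omission of a camera-specific factor are assumptions here.

```python
BENNETT_K = 0.01306        # assumed Bennett constant
BENNETT_OFFSET_MM = 1.82   # assumed Bennett offset (mm)
REFERENCE_AL_MM = 24.0     # reference axial length stated in the abstract


def bennett_q(axial_length_mm: float) -> float:
    """Ocular magnification factor q for a given axial length."""
    return BENNETT_K * (axial_length_mm - BENNETT_OFFSET_MM)


def linear_scale(axial_length_mm: float, reference_mm: float = REFERENCE_AL_MM) -> float:
    """Factor by which lengths and diameters must be multiplied relative to the reference eye."""
    return bennett_q(axial_length_mm) / bennett_q(reference_mm)


def area_scale(axial_length_mm: float, reference_mm: float = REFERENCE_AL_MM) -> float:
    """Areas scale with the square of the linear factor."""
    return linear_scale(axial_length_mm, reference_mm) ** 2


def correct_length(measured: float, axial_length_mm: float) -> float:
    """Rescale a length measured under the reference-eye assumption to the true axial length."""
    return measured * linear_scale(axial_length_mm)


for al in (23.0, 24.0, 25.0):
    lin_err = (linear_scale(al) - 1.0) * 100
    area_err = (area_scale(al) - 1.0) * 100
    print(f"AL {al:.1f} mm: linear {lin_err:+.1f}%, area {area_err:+.1f}%")
```

Under these assumptions a 1 mm deviation from 24 mm changes linear measures by about 4.5% and areas by about 9%, in line with the ranges quoted in the abstract.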

6
Remote Physiologic Monitoring and Principal Care Management for Chronic Retinal Diseases: Results from over 80,000 Encounters

Dhoot, S.; Boyer, D.; Avery, R.; Stoller, G.; Couvillion, S.; Ferrone, P.; Crane, P.; Ianchulev, T.; Chen, E. P.

2026-03-02 ophthalmology 10.64898/2026.02.27.26347265
Top 0.6%
85× avg

Purpose: Timely detection of disease activity in chronic retinal diseases improves visual outcomes but is limited by the lack of validated systems for continuous monitoring and care management. We evaluated the real-world performance of an integrated remote physiologic monitoring and principal care management program (RemoniHealth(R)) using a self-administered multimodal retinal function test (Macustat(R)) for home monitoring. Methods: This single-arm real-world intervention study was conducted across 33 retina practices. A total of 2,216 adults with chronic retinal diseases performed weekly home retinal function testing with integrated care management support. Primary endpoints included the annualized rate of disease progression detection, time to intervention after first flag, true positive rate, and patient adherence. Data were summarized with descriptive statistics and analyzed using chi-square tests and Clopper-Pearson confidence intervals. Results: Participants contributed 82,644 encounters and 16,805 patient-months of monitoring. The program generated 241 alerts, including 101 Macustat flags and 135 care management prompts. Among 73 adjudicated flags, 56 were true positives and 17 false positives (PPV 76.7%). The annualized detection rate was 4 per 100 patient-years. Of confirmed events, 93% led to intravitreal injection or other major management change. Mean adherence was 72.1%, and patients with ≥80% adherence had higher odds of true positivity. Discussion: This RPM-PCM model achieved high engagement and meaningful detection of asymptomatic progression between visits, supporting the value of home monitoring for timely intervention. Translational Relevance: These findings support scalable integration of home vision testing and care management into routine retinal practice to enable earlier intervention and improved continuity of care.
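The headline figures in abstract 6 can be re-derived from the reported counts. The sketch below computes the PPV with a Clopper-Pearson (exact binomial) interval using only the standard library, plus the annualized detection rate. It assumes the rate's numerator is the 56 adjudicated true positives and its denominator is the 16,805 patient-months converted to years; the abstract does not state this explicitly, and it does not report the interval itself, so none is asserted here. The function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided binomial confidence interval, found by bisection."""

    def solve(cdf_at_p, target: float) -> float:
        # cdf_at_p is monotonically decreasing in p on [0, 1].
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if cdf_at_p(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


true_positives, adjudicated = 56, 73
ppv = true_positives / adjudicated
ci_low, ci_high = clopper_pearson(true_positives, adjudicated)

patient_years = 16_805 / 12  # patient-months converted to patient-years
rate_per_100_py = true_positives / patient_years * 100  # assumed numerator: true positives

print(f"PPV = {ppv:.1%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
print(f"Annualized detection rate = {rate_per_100_py:.1f} per 100 patient-years")
```

With these inputs the PPV rounds to 76.7% and the rate to 4.0 per 100 patient-years, matching the abstract.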

7
Rare Coding Variant Associations With Primary Open-Angle Glaucoma In African Ancestry: A Multi-Cohort Exome-Wide Meta Analysis

Ikuzwe Sindikubwabo, A. B. B.; Fan, Y.; Zhu, Y.; Caruth, L.; Salowe, R.; Zhao, B.; O'Brien, J.; Setia-Verma, S.

2026-02-27 ophthalmology 10.64898/2026.02.25.26347141
Top 0.6%
82× avg

Primary open-angle glaucoma (POAG) disproportionately affects individuals of African ancestry, yet rare coding variation in this population remains understudied. To address this gap, we performed a multi-cohort exome-wide meta-analysis across POAAGG, PMBB, All of Us, and UK Biobank, including 4,815 POAG cases and 22,922 controls of genetically inferred African ancestry. Although no gene reached exome-wide significance, we identified several suggestive gene-level associations driven by rare variants (minor allele frequency ≤0.1% or singletons), including signals in SRF, BLTP3A, METTL2A, and KRT10. Among these, SRF demonstrated the strongest association and was driven by rare missense variants with moderate effect sizes. Given its role in cytoskeletal organization and actin dynamics, processes central to trabecular meshwork function and intraocular pressure regulation, SRF represents a biologically plausible candidate gene. Notably, these genes have not been previously highlighted in predominantly European ancestry POAG association studies, suggesting potential ancestry-specific rare variant contributions. Overall, our findings highlight the critical importance of investigating rare coding variation in POAG in disproportionately affected populations to deepen understanding of POAG etiology and genetic risk.

8
Effects of morning and evening narrowband blue light and myopic defocus on axial length in humans

Thakur, S.; Khudkhudia, H.; Sankaridurg, P.; Verkicharla, P. K.

2026-03-04 ophthalmology 10.64898/2026.03.03.26347502
Top 0.6%
79× avg

Purpose: To investigate the effects of morning and evening narrowband blue light exposure on axial length, and to examine the short-term effect of morning blue light combined with myopic defocus on axial length. Methods: For objective 1, 18 individuals underwent 60 minutes of narrowband blue light exposure (460 nm) in the morning (9:00-11:00 AM) and evening (5:00-7:00 PM) of the same day. The axial length values were normalized to the average of the morning and evening axial length values. For objective 2, 27 young adults were exposed to 60 minutes of narrowband blue light and broadband white light while wearing a +3.00 D lens over the right eye. Axial length was measured using Lenstar LS900. Results: A significant reduction in axial length was observed after exposure to morning blue light compared to evening blue light (-10.0 ± 3.96 µm vs. -0.67 ± 3.30 µm; p=0.02), whereas no such effect was observed with broadband white light exposure (0.0 ± 3.53 µm vs. -2.50 ± 4.23 µm, p=0.70). While the broadband white light exposure did not alter the normal diurnal variation in axial length (+2.35 ± 1.82 µm vs. -6.25 ± 2.21 µm, p=0.04), blue light diminished such a pattern (-4.12 ± 1.72 µm vs. -2.00 ± 2.00 µm, p=0.48). The myopic defocus did not influence axial length under either narrowband blue or broadband white light conditions. Conclusion: Short-term narrowband blue light exposure led to a significantly greater decrease in axial length with morning than with evening exposure, with a likely influence on the diurnal rhythm of axial length. Morning blue light exposure with lens-induced myopic defocus did not provide additional short-term modulation of axial length.

9
From Blurry to Brilliant: HAGAN, a Hybrid Attention GAN for Home-Based OCT Image Enhancement with Magical Results

Arian, R.; Allen, E.; Tyler, M.; Kafieh, R.

2026-02-25 ophthalmology 10.64898/2026.02.23.26346915
Top 0.8%
58× avg

Regular optical coherence tomography (OCT) monitoring is essential for early detection of retinal disease and timely intervention, but frequent clinic-based imaging burdens patients and healthcare systems. Home-based OCT enables continuous monitoring and reduces clinic visits; however, compact optics and patient-operated acquisition introduce noise, reduced resolution, motion blur, and artifacts that limit clinical reliability and diagnostic confidence. To model home-based OCT acquisition, we employ simulated data reflecting images from Siloton, a compact home-based OCT device. Clinically realistic noise and acquisition artifacts were applied to high-quality OCT images using Siloton's simulation software, generating near-real patient-operated scans. Building on this dataset, we propose HAGAN, a Hybrid Attention Generative Adversarial Network developed through a progressive strategy, evolving from a baseline U-Net to an adversarial framework with hybrid attention. The best-performing U-Net architecture, EfficientNet-B1, identified through evaluation and ablation studies, is adopted as the generator. The generator incorporates attention gates at its skip connections and self-attention modules within the decoder, and is paired with a VGG19-based discriminator to form the HAGAN architecture. The model is trained using a multi-objective loss combining pixel-wise, structural, perceptual, edge-preserving, and adversarial components. Experiments on simulated home-based OCT data demonstrate that HAGAN consistently outperforms baseline and state-of-the-art models across standard enhancement metrics and a clinically relevant retinal layer segmentation downstream task, improving visual quality and preservation of diagnostically meaningful anatomical structures. These findings support the potential of HAGAN for reliable enhancement in future home-based OCT platforms, enabling remote retinal monitoring and reducing reliance on in-clinic imaging and routine hospital visits.
Highlights
Enhancing the quality of home-based OCT images to support remote retinal monitoring and reduce the need for frequent referrals to clinical imaging centers
Proposing HAGAN, a hybrid attention generative adversarial network for enhancing OCT images acquired using the Siloton home-based OCT device
Hybrid attention design combining attention gates and self-attention to preserve fine retinal details and global anatomical consistency
Adversarial learning framework improving perceptual realism and preservation of diagnostically relevant retinal structures in low-quality home-acquired OCT images
Progressive model development from baseline U-Net to hybrid attention GAN, demonstrating systematic and measurable performance improvements
Clinical relevance validated through downstream retinal layer segmentation, confirming preservation of diagnostically important structures

10
Abnormal Lipid Profiles as Markers of Diabetic Macular Edema Among Patients with Type 2 Diabetes Mellitus Attending a Tertiary Hospital in Northern Tanzania: A One-Year Cross-Sectional Study

HUUD, M.; MAKUPA, W.; MAKUPA, A.; DEOCAR, R.; SANDI, F.

2026-03-04 ophthalmology 10.64898/2026.03.03.26347512
Top 0.8%
54× avg

Background: Diabetes mellitus (DM) remains a major global health challenge and is associated with vision-threatening complications, including diabetic macular edema (DME), a leading cause of visual impairment. Dyslipidemia has been implicated in the development of macular edema through mechanisms involving vascular permeability, endothelial dysfunction, and chronic inflammation. However, evidence regarding the relationship between lipid abnormalities and macular edema remains inconsistent across studies. Aim: This study aimed to evaluate the association between abnormal lipid profiles and diabetic macular edema among patients with type 2 diabetes mellitus attending Kilimanjaro Christian Medical Centre (KCMC). Methods: A hospital-based analytical cross-sectional study was conducted among 296 diabetic outpatients at KCMC. Participants underwent comprehensive ophthalmic evaluation including fundoscopy and imaging with optical coherence tomography (OCT) for assessment of macular edema. Blood samples were collected for biochemical lipid analysis. Data were cleaned and analyzed using STATA version 17. Results: Diabetic macular edema was identified in 56.4% (167/296) of participants. Abnormal lipid parameters were common, with elevated total cholesterol observed in 48.6%, triglycerides in 43.6%, low-density lipoprotein (LDL) in 36.1%, and reduced high-density lipoprotein (HDL) in 38.9% of patients. Elevated total cholesterol, triglycerides, and LDL levels showed significant associations with macular edema (p < 0.05). After multivariable adjustment, serum triglycerides remained independently associated with macular edema (p = 0.002). Conclusion: Dyslipidemia demonstrated a significant association with diabetic macular edema, with serum triglycerides emerging as an independent predictor. These findings highlight the importance of lipid monitoring, lifestyle modification, and strengthened screening strategies in reducing the burden of vision-threatening diabetic complications.

11
Multimodal AI fuses proteomic and EHR data for rational prioritization of protein biomarkers in diabetic retinopathy

Lin, J. B.; Mataraso, S. J.; Chadha, M.; Velez, G.; Mruthyunjaya, P.; Aghaeepour, N.; Mahajan, V. B.

2026-02-24 ophthalmology 10.64898/2026.02.23.26346903
Top 0.9%
50× avg

Purpose: There is a need for novel therapies for diabetic retinopathy (DR) because existing therapies treat only certain features of DR and do not work optimally for all patients. While proteomic studies provide insight into disease pathobiology, they are often limited to small sample sizes due to high costs, limiting their generalizability and reproducibility. Moreover, they often yield lists of tens to hundreds of proteins with differential expression, making it difficult to prioritize the most biologically relevant biomarkers beyond using arbitrary fold-change and false-detection rate cutoffs. Here, we applied a two-stage multimodal AI approach: first, we integrated EHR and proteomics data to rationally prioritize candidate protein biomarkers and, next, validated these biomarkers in an independent cohort. These protein biomarkers of DR are rooted in the EHR data and thereby more likely to be biological drivers of disease. Methods: We obtained EHR data from a large number of patients with and without DR (N=319,997) from the STARR-OMOP database and obtained aqueous humor liquid biopsies from a subset of these patients (N=101) for high-resolution proteomic profiling. We developed Clinical and Omics Multi-Modal Analysis Enhanced with Transfer Learning (COMET) to perform integrated analysis of proteomics and all available EHR data to identify protein biomarkers of DR. The model was trained in two phases: first, it was pretrained using patients with EHR data alone (N=319,896), and then it was fine-tuned using patients with both EHR and proteomics data (N=101), allowing it to learn both clinical and molecular features associated with DR. Findings from COMET were then validated with liquid biopsies from an independent validation cohort (N=164). Results: t-distributed stochastic neighbor embedding (t-SNE) analysis of EHR and proteomics data identified proteins clustering with related EHR features.
Levels of STX3 and NOTCH2, proteins involved in retinal function, were correlated with a diagnosis of macular edema, a record of a visual field exam, and a prescription for latanoprost, highlighting protein-EHR alignment. The pretrained, multimodal COMET model was superior (AUROC=0.98, AUPRC=0.91) compared to models generated using either EHR or proteomics data alone or without pretraining (AUROC: 0.76 to 0.92; AUPRC: 0.47 to 0.74). The proteins SERPINE1, QPCT, AKR1C2, IL2RB, and SRSF6 were prioritized by the COMET model compared to the models without pretraining, supporting their potential role in DR pathobiology, and were subsequently validated in an independent cohort. Conclusion: We used multimodal AI to prioritize protein biomarkers of DR that are most strongly linked to EHR elements, as well as identifying other protein biomarkers associated with disease features like diabetic macular edema. These findings serve as a foundation for future mechanistic studies and highlight the synergistic value of using multimodal AI to fuse EHR and proteomics data for enhanced proteomics analysis.

12
CausalFund: Causality-Inspired Domain Generalization in Retinal Fundus Imaging for Low-Resource Screening

Shi, M.; Zheng, H.; Gottumukkala, R.; Jonathan, N.; Armstong, G. W.; Shen, L. Q.; Wang, M.

2026-03-03 ophthalmology 10.64898/2026.03.02.26347127
Top 1.0%
28× avg

Early screening for glaucoma and diabetic retinopathy (DR) is critical to prevent irreversible vision loss, yet remains inaccessible to many underserved populations. However, AI models trained on hospital-grade fundus images often generalize poorly to low-cost images acquired with portable devices such as smartphones. We proposed CausalFund, a causality-inspired learning framework for training AI models that enable reliable low-resource screening from easily acquired non-clinical images. CausalFund disentangles disease-relevant retinal features from spurious image factors to achieve domain-generalizable screening across clinical and non-clinical settings. We integrated CausalFund with seven deep learning backbones for glaucoma and DR screening from portable-device fundus images, including lightweight architectures suitable for on-device deployment. Across diverse experimental settings and image quality conditions, CausalFund consistently improved AUC and achieved a more favorable sensitivity-specificity trade-off than conventional deep learning baselines. As a model-agnostic framework, CausalFund could be extended to other diseases and low-resourced scenarios characterized by degraded or non-standard imaging.

13
Are low ergothioneine levels a risk factor for age-related macular degeneration and other ocular disorders?

Cheah, I. K.; Fong, Z.; Chen, L.; Tang, R. M. Y.; Zhou, L.; Yanagi, Y.; Cheng, C. Y.; Su, X.; Li, X.; Teo, K. Y. C.; Cheung, C. M. G.; Tan, T.-E.; Halliwell, B.

2026-03-02 ophthalmology 10.64898/2026.02.27.26347162
Top 1%
19× avg

Age-related macular degeneration (AMD) is a leading cause of irreversible vision loss in ageing populations, with oxidative stress recognised as a key pathogenic driver. The dietary antioxidant and cytoprotectant L-ergothioneine (ET) is avidly accumulated in many tissues, especially the eye. However, its relationship to AMD has not been investigated. Here, we examined ET's distribution in ocular tissue and assessed circulating and intraocular ET levels in patients with neovascular AMD. Compared with ocularly normal age-matched individuals, AMD patients exhibited significantly lower serum ET; elevated levels of the ET metabolites hercynine and ETSO, which may be generated by oxidative stress; and elevated levels of serum allantoin, a product of oxidative damage to urate in humans. Levels of ET in aqueous humour in AMD patients were marginally lower than in cataractous patients, who are already known to have significantly lower ET levels than healthy eyes. High ET levels were seen in human ocular tissues, concentrating in regions vulnerable to oxidative injury, including the lens, retina, retinal pigment epithelium, and choroid, supporting a physiological protective role of ET in the eye. These findings identify a strong association between low ET levels and AMD, warranting further studies to determine whether ET supplementation can modify AMD risk or progression.

14
Signal change of cerebrospinal fluid with eye drops of O-17-labeled saline

Miyata, M.; Tomiyasu, M.; Sahara, Y.; Tsuchiya, H.; Maeda, T.; Tomoyori, N.; Kawashima, M.; Kishimoto, R.; Mizota, A.; Kudo, K.; Obata, T.

2026-02-17 radiology and imaging 10.64898/2026.02.12.26346215
Top 1%
7.9× avg

Purpose: Aqueous humor is known to drain from the eye not only via the conventional pathway through the trabecular meshwork and Schlemm's canal, but also via pathways through the posterior chamber and optic nerve to the cerebrospinal fluid (CSF) surrounding the optic nerve. The mechanism is poorly understood, and a non-invasive method for evaluation in living humans has not been established. We previously showed that eye drops containing O-17-labeled water (H₂¹⁷O) distribute in the anterior chamber but not the vitreous. This study aimed to evaluate the distribution of H₂¹⁷O in the CSF along the optic nerve. Methods: Five ophthalmologically normal participants (20-31 years, all female) were selected from a previous prospective study based on ¹H MR images of the eyes that included the optic nerve. They received eye drops of 10 mol% H₂¹⁷O in their right eye. A dynamic image time series was created by normalizing the signal of each ¹H-T2WI by the pre-drop average signal. Region-of-interest analyses were performed for signal changes in the anterior chamber, vitreous, and CSF. Results: In the quantitative evaluation, the normalized intensity in the anterior chamber and CSF was significantly lower than the pre-drop signal (anterior chamber: 0.78 ± 0.07, p < 0.005; CSF: 0.89 ± 0.07, p < 0.05). No distribution was identified in the vitreous. Qualitatively, the distribution of H₂¹⁷O in the anterior chamber was detected in all five participants and in the CSF of four participants (80%). Conclusion: H₂¹⁷O eye drops were distributed in the anterior chamber and CSF, but not in the vitreous. These findings suggest that aqueous humor outflow by routes other than Schlemm's canal can be visualized, and that such outflow may contribute to ocular fluid homeostasis, including the ocular glymphatic system.
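The baseline normalization described in the Methods (dividing each time point by the pre-drop average signal) can be illustrated with a minimal sketch. The function name, the ROI values, and the number of pre-drop frames are hypothetical and chosen only for illustration; the abstract does not give the authors' actual processing pipeline.

```python
from statistics import mean


def normalize_by_baseline(signal, n_pre):
    """Divide every time point by the mean of the first n_pre (pre-drop) points.

    `signal` is a list of mean ROI intensities over time; `n_pre` is the number
    of frames acquired before the eye drop. Both are illustrative assumptions.
    """
    if n_pre <= 0 or n_pre > len(signal):
        raise ValueError("n_pre must be between 1 and len(signal)")
    baseline = mean(signal[:n_pre])
    if baseline == 0:
        raise ValueError("baseline signal is zero; cannot normalize")
    return [value / baseline for value in signal]


# Hypothetical ROI time series: three pre-drop frames, then three post-drop frames.
roi_signal = [100.0, 102.0, 98.0, 90.0, 80.0, 78.0]
normalized = normalize_by_baseline(roi_signal, n_pre=3)
```

With this convention a normalized value below 1 means the signal dropped relative to the pre-drop baseline, which is how figures such as 0.78 and 0.89 in the Results are read.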

15
Onco-Shikshak: An AI-Native Adaptive Learning Ecosystem for Medical Oncology Education

Makani, A.

2026-02-26 oncology 10.64898/2026.02.23.26346944
Top 1%
7.8× avg

Medical oncology education faces a dual crisis: knowledge velocity that outpaces static curricula and large language model (LLM) risks--hallucination and automation bias--that threaten the fidelity of AI-assisted learning. We present Onco-Shikshak V7, an AI-native adaptive learning platform that addresses both challenges through a unified cognitive architecture grounded in learning science. The system replaces isolated educational modules with four authentic clinical workflows--Morning Report, Tumor Board, Clinic Day, and AI Textbook--each scaffolded by a nine-module pedagogy engine that integrates ACT-R activation dynamics (illness scripts), Item Response Theory (adaptive difficulty), the Free Spaced Repetition Scheduler (FSRS v4), Zone of Proximal Development (scaffolding), and metacognitive calibration training (Brier score). Six specialist AI agents--medical oncology, radiation oncology, surgical oncology, pathology, radiology, and oncology navigation--engage in multi-disciplinary deliberation with per-specialty retrieval-augmented generation (RAG) grounding across nine authoritative guideline sources including NCCN, ESMO, and ASTRO. The platform provides 18 clinical cases with decision trees across six cancer types, maps every interaction to 13 ACGME Hematology-Oncology milestones, and implements four closed-loop feedback mechanisms that connect session errors to targeted flashcards, weak domains to suggested cases, and all interactions to a persistent learner profile. Technical validation confirms algorithmic correctness across eight subsystems. To our knowledge, this is the first system to unify ACT-R, IRT, FSRS, ZPD, and metacognitive calibration in a single medical education platform. Formal learner evaluation via randomized controlled trial is planned.

16
Can AI Match Human Experts? Evaluating LLM-Generated Feedback on Resident Scholarly Projects

van Allen, Z.; Forgues-Martel, S.; Venables, M. J.; Ghanney, Y.; Villeneuve, A.; Dongmo, J.; Ahmed, M.; Archibald, D.; Jolin-Dahel, K.

2026-03-04 medical education 10.64898/2026.03.04.26346878
Top 1%
7.6× avg

Background: Delivering timely, high-quality feedback on resident scholarly projects is labour-intensive, especially in large programmes. We developed an AI-assisted evaluation system, powered by the open-weight LLaMA-3.1 large language model (LLM), to generate formative feedback on Family Medicine residents' scholarly projects and compared its performance with expert human evaluators. Methods: We evaluated whether the AI-generated feedback achieves comparable quality to expert feedback. The tool ingests heterogeneous resident submissions (PDFs, scans, photographs) via OCR and produces section-by-section feedback aligned with programme rubrics. In a three-phase study we evaluated 240 feedback reports (Short, Question and Timeline, Final; n = 80 each). Within each phase, 40 reports were AI-generated and 40 were produced by research experts across four project types: Quality Improvement, Survey-Based, Research, and Literature Review. Blinded raters used a 25-item survey across five constructs: understanding & reasoning, trust & confidence, quality of information, expression style & persona, safety & harm. Results: Survey reliability was high across phases (α = .71-.98). Human feedback generally out-scored AI. In short reports, humans led on quality (Mean ± SD; 4.14 ± 0.57 vs 3.09 ± 1.05) and trust (3.96 ± 0.71 vs 2.78 ± 1.15). In final reports, differences were smaller for quality (4.09 ± 0.65 vs 3.49 ± 0.68) and persona (4.16 ± 0.40 vs 3.91 ± 0.50), while AI was preferred for safety (4.50 ± 0.60 vs 4.36 ± 0.56). Performance varied by project type: in survey-based final reports the AI led on quality (4.28 ± 0.50 vs 3.98 ± 0.44) and safety (4.58 ± 0.40 vs 4.24 ± 0.67), whereas in quality-improvement short reports humans were markedly superior in reasoning (4.27 ± 0.68 vs 2.33 ± 1.00).
Conclusions: An open-weight LLM with curated prompts can generate rubric-aligned feedback at scale that approaches the quality of expert human feedback. While expert feedback remained superior overall, AI surpassed humans in selected contexts and safety assessments. Performance of the tool is expected to improve over time as newer and more capable open-weight models are released. Our code and system prompts are open source.

17
Prompting is All You Need: How to Make LLMs More Helpful for Clinical Decision Support

Dymm, B.; Goldenholz, D. M.

2026-02-22 neurology 10.64898/2026.02.12.26346005
Top 1%
4.9× avg

Importance: Large language models (LLMs) offer potential decision support, but their accuracy varies. Prompt engineering can generally enhance LLM behavior in a clinical context, yet best practices have yet to be formally explored in realistic neurology settings. Objective: To evaluate the impact of structured prompting versus simple prompting on the performance of six LLMs (three closed-source: OpenAI GPT-4o, OpenAI o3, OpenAI GPT-5.2 Thinking; three open-source: Meta Llama-4-Scout-17B-16E-Instruct, Llama-3.3-70B-Instruct-Turbo, and the reasoning model R1-1776) for thrombolytic clinical decision support (CDS) in acute stroke. Design: Models responded to three novel ischemic stroke vignettes using either a simple question ("Should this patient be offered thrombolytics?") or a five-step structured prompt (CARDS) guiding information extraction, timing analysis, contraindication checking, decision process explanation, and risk-benefit discussion. Outputs were assessed across seven domains: guideline adherence, unsafe recommendations, risk recognition, guideline grading accuracy, inclusion of conversational explanation, clarity, and overall helpfulness. Results: Structured prompts significantly enhanced performance across most domains, with varying effects between model families. For some closed-source models (GPT-4o, o3), prompts structured in the CARDS style improved guideline adherence from 83.3% to 100%, eliminated unsafe recommendations (16.7% to 0%), and increased specific guideline grading accuracy from 0% to 100%. The closed-source reasoning model GPT-5.2 Thinking similarly achieved 100% adherence, 0% unsafe recommendations, and 100% grading accuracy with structured prompts, while also maintaining perfect safety and risk recognition under simple prompting.
Similarly, the open-source reasoning model R1-1776 achieved these top-tier outcomes (100% adherence, 0% unsafe, 100% grading, 100% conversation) when structured prompts were applied, with grading and conversation improving from 0%. In contrast, other open-source models (Llama-4-Scout, Llama-3.3-70B) showed more modest gains: risk recognition improved (83.3% to 100%) and guideline grading accuracy increased (0% to 66.7%), while guideline adherence (66.7%) and unsafe recommendations (33.3%) persisted. Overall, structured prompting yielded the largest improvements in guideline grading accuracy and conversational reasoning across multiple models. Conclusion: Structured prompting substantially enhances LLM performance for acute stroke thrombolysis CDS. Notably, some models, including the proprietary GPT-4o, o3, and GPT-5.2 Thinking, and the open-source reasoning model R1-1776, achieved excellent safety and adherence with structured prompts. For clinical deployment of any LLM, structured prompts are crucial, and vigilant human oversight remains essential.

18
Boards-style benchmarks overestimate prior-chat bias in large language models: a factorial evaluation study

Stanwyck, C.; Adibi, A.; Dozie-Nnamah, P.; Alsentzer, E.

2026-02-14 health informatics 10.64898/2026.02.12.26346164
Top 1%
4.9× avg

Background: Large language models (LLMs) are increasingly piloted as chat interfaces for chart review and clinical decision support. Although leading models achieve and even exceed physician-level accuracy on exam-style benchmarks such as MedQA, recent perturbation studies show large drops in accuracy after small changes to prompts, distractor content, or answer format. Prior work has not systematically examined how these vulnerabilities unintentionally manifest in clinically realistic settings, including multi-turn chatbot interactions, free-text response formats, and tasks involving patient medical records. Methods: We evaluated susceptibility to bias from prior chat messages across 14 LLMs (10 closed-source, 4 open-source) on two medical question-answering tasks: a boards-style benchmark (1000 MedQA test questions) and an electronic health record (EHR) information retrieval task (962 EHRNoteQA questions about real patient discharge summaries). Using a factorial design, we independently varied the presence and type of prior-chat distractors and response format across these two tasks. Distractors ranged from simple statements of incorrect answers to more realistic conversational exchanges between user and model, including interactions referencing a different patient. Findings: Prior-chat distractors produced large and consistent accuracy decrements in the MedQA multiple-choice setting, particularly when the prior message stated an incorrect answer. In this setting, insertion of this user message led to significant accuracy decreases in 13 of 14 models, with drops averaging 15·0 percentage points across models. Effects were smaller for more plausible, conversational distractors and in free-response formats. In contrast, prior-chat bias in the discharge summary-based task was modest and inconsistent.
Average accuracy decreases were under 2 percentage points across all distractor types and response formats assessed, with significant effects observed in a minority of models. Interpretation: LLM performance can be biased toward incorrect answers by plausible prior-chat distractors, but these effects are highly context-dependent. We find that distraction effects are common and often substantial in the boards-style multiple-choice task, particularly when the distractor is an explicit (and unrealistic) prior message containing an incorrect answer. In contrast, these effects are markedly attenuated when the same questions are posed in free-response format and the distractor is incorporated into a clinically-realistic user-model exchange in the chat history, or when the task is switched from a boards-style vignette to a question about a real (de-identified) patient record. Taken together, these results suggest that evaluations based solely on single-turn, boards-style multiple-choice questions with unrealistic distractors may overstate the impact of prior-chat bias. These findings highlight the need to assess LLM behavior in multi-turn settings involving realistic clinical use cases, rather than relying on boards-style benchmarks for assessment of safety risks.

19
Monte Carlo Committee Simulation with Large Language Models for Predicting Drug Reimbursement Recommendations and Conditions: A Novel Neurosymbolic AI Approach

Janoudi, G.; Rada (Uzun), M.; Yasinov, E.; Richter, T.

2026-03-03 health policy 10.64898/2026.03.02.26347434
Top 1%
4.7× avg

Background: Health technology assessment (HTA) agencies issue reimbursement recommendations that determine patient access to new therapies. Predicting these outcomes would enable sponsors to optimize market access strategies and health systems to anticipate budget impacts. However, traditional machine learning approaches require extensive manual feature extraction and predict only categorical outcomes, not the specific conditions attached to recommendations. Methods: We developed Monte Carlo Committee Simulation, a neurosymbolic system that simulates multi-panelist deliberation using 14 persona-conditioned large language model panelists with weighted voting and uncertainty quantification. We conducted a temporal external validation study on CDA-AMC (Canada's Drug Agency) sponsor-submitted recommendations published between October 2024 and December 2025 (n=67), after the knowledge cutoff of the underlying models, ensuring predictions reflected reasoning rather than memorization. The system predicted both recommendation category (Reimburse with Conditions, Do Not Reimburse) and five condition categories (Population Restrictions, Prescriber/Setting Requirements, Continuation Conditions, Economic Conditions, Evidence Conditions). Results: On submissions where the system expressed confidence (n=44), recommendation prediction achieved 93.2% accuracy (95% CI: 84.1-100.0%), exceeding the 91.8% (95% CI: 83.7-98.0%) majority class baseline. The system demonstrated superior discrimination versus chance level (AUROC 0.817, 95% CI: 0.45-0.99, vs 0.500) and calibrated confidence estimates (ECE = 0.091). Pre-specified Strength of Mandate stratified accuracy from 96.8% (High, 95% CI: 90.3-100.0%) to 40.0% (Weak, 95% CI: 0.0-80.0%), with 83.3% of errors occurring in cases flagged as uncertain (p=0.0025). Analysis of the 5 abstained cases confirmed 40.0% accuracy, validating the system's identification of uncertain predictions.
For condition prediction, the system achieved 48.8% subset accuracy, requiring correct simultaneous prediction of all 5 condition categories (2^5 = 32 possible combinations), and 86.3% Hamming accuracy versus 25.8% for a no-conditions baseline. Per-category accuracy ranged from 68.3% (Continuation Conditions) to 97.6% (Economic Conditions), with Continuation Conditions demonstrating the strongest discriminative ability (AUROC 0.896, 95% CI: 0.79-0.98). Conclusions: Monte Carlo Committee Simulation enables a shift from reactive to proactive market access: anticipating specific reimbursement conditions before committee review, with calibrated confidence that identifies which predictions to trust. Validated on temporally separated data the models could not have memorized, the system can be positioned as a forecasting aid that complements rather than replaces human deliberation.
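The difference between subset accuracy and Hamming accuracy for a five-category multi-label prediction can be shown with a small sketch. The label vectors below are invented for illustration and are not data from the study; the function names are likewise hypothetical and only implement the standard definitions of the two metrics.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of cases where every label in the vector is predicted correctly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if tuple(t) == tuple(p))
    return exact / len(y_true)


def hamming_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly, pooled over all cases."""
    total = sum(len(t) for t in y_true)
    correct = sum(
        1
        for t, p in zip(y_true, y_pred)
        for a, b in zip(t, p)
        if a == b
    )
    return correct / total


# Hypothetical example: two submissions, five binary condition categories each.
y_true = [(1, 0, 1, 1, 0), (1, 1, 0, 1, 0)]
y_pred = [(1, 0, 1, 1, 0), (1, 0, 0, 1, 1)]

subset = subset_accuracy(y_true, y_pred)    # 1 of 2 vectors match exactly
hamming = hamming_accuracy(y_true, y_pred)  # 8 of 10 individual labels match
```

Because one wrong label out of five makes the whole vector count as a miss, subset accuracy is the stricter of the two, which is consistent with the gap between the 48.8% and 86.3% figures reported above.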

20
Linguistic Effects of Ambient AI on Clinical Documentation: A Matched Pre-Post Study

Li, Y.; Zhou, H.; Blackley, S.; Plasek, J. M.; Lyu, Z.; Zhang, W.; You, J.; Centi, A.; Mishuris, R.; Yang, J.; Zhou, L.

2026-02-17 health informatics 10.64898/2026.02.16.26346370
Top 1%
4.5× avg

Ambient intelligence-based systems are increasingly used for clinical documentation. To quantify linguistic differences associated with ambient documentation, we conducted a matched pre-post analysis of 6,026 outpatient clinical notes from Mass General Brigham following implementation of two ambient AI documentation systems (Nuance Dragon Ambient eXperience [DAX] and Abridge). Within-clinician comparisons focused on the History of Present Illness (HPI) and Assessment and Plan (A&P) sections and evaluated syntactic complexity, lexical ambiguity, linguistic variability, discourse coherence, and readability. Manual review of 50 paired notes was performed to validate findings from automated linguistic analyses. Our analyses indicate that the linguistic effects of ambient documentation are both vendor-dependent and section-specific. Across both vendors, ambient notes in HPI were longer and exhibited greater syntactic complexity (longer sentences and clauses, increased dependency distance), lower lexical ambiguity, lower language-model perplexity, and higher local and global discourse coherence. These findings indicate that ambient systems systematically restructure conversational input into more syntactically elaborated and linguistically predictable narratives, reflecting increased standardization relative to both general-domain and biomedical language models. In contrast, changes in A&P were smaller and more heterogeneous, consistent with its more structured/templated nature. Readability analyses further showed increased length and lexical complexity in ambient HPI, whereas A&P readability differences were minimal. Overall, our findings demonstrate that ambient documentation changes how clinical information is linguistically expressed and organized, with effects varying by note section, vendor, and provider role/specialty. 
Evaluation should therefore extend beyond efficiency to consider effects on communication, cognitive load, clinical inference, and downstream analytics.